Statistical Learning

Probability Distribution

  • Probability that a random variable (r.v.) X takes each possible value:
$P(X = x_i) = p(x_i), \quad i \in \{1, \dots, n\}$
  • Satisfying:
$\sum_{i=1}^{n} p(x_i) = 1, \quad p(x_i) \geq 0, \quad i \in \{1, \dots, n\}$

Discrete Random Variable

Probability Mass Function (PMF)

  • The Probability Mass Function (PMF) gives the probability that r.v. X takes the value $x_i$:

$p(x_i) = P(X = x_i)$

Bernoulli Distribution

  • In a trial, event A happens with probability $\mu$ and does not happen with probability $1 - \mu$.

  • If r.v. X indicates the number of occurrences of event A, then X can be 0 or 1, and its distribution is:

$p(x) = \mu^x (1-\mu)^{1-x}, \quad x \in \{0, 1\}$
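A minimal sketch of this PMF in Python; the value of $\mu$ is illustrative:

```python
# Bernoulli PMF: p(x) = mu^x * (1 - mu)^(1 - x) for x in {0, 1}.
def bernoulli_pmf(x, mu):
    """Probability that X = x, where mu is the success probability."""
    return mu ** x * (1 - mu) ** (1 - x)

mu = 0.3  # illustrative success probability
print(bernoulli_pmf(1, mu))  # probability that A occurs
print(bernoulli_pmf(0, mu))  # probability that A does not occur
```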

Binomial Distribution

  • In n independent Bernoulli trials, if r.v. X represents the number of occurrences of event A, then X takes values in $\{0, \dots, n\}$, with distribution:
$p(X = k) = \binom{n}{k} \mu^k (1-\mu)^{n-k}, \quad k = 0, 1, \dots, n$
  • The binomial coefficient $\binom{n}{k}$ counts the number of ways to choose k elements out of n elements regardless of their order.
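The formula above can be sketched directly with `math.comb`; n and $\mu$ here are illustrative:

```python
from math import comb

# Binomial PMF: p(X = k) = C(n, k) * mu^k * (1 - mu)^(n - k).
def binomial_pmf(k, n, mu):
    """Probability of exactly k occurrences of A in n Bernoulli trials."""
    return comb(n, k) * mu ** k * (1 - mu) ** (n - k)

n, mu = 10, 0.5  # illustrative parameters
# The probabilities over k = 0, ..., n must sum to 1.
total = sum(binomial_pmf(k, n, mu) for k in range(n + 1))
print(total)
```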

Continuous Random Variable

Probability Density Function (PDF)

The probability distribution of a continuous r.v. can be described by the Probability Density Function (PDF) f(x), satisfying:

$\int_{-\infty}^{+\infty} f(x)\,dx = 1, \quad f(x) \geq 0$

Cumulative Distribution Function (CDF)

The Cumulative Distribution Function (CDF) is the probability that the value of r.v. X is less than or equal to x:

$F(x) = P(X \leq x)$
  • For a continuous r.v., we have:
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
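The integral defining F(x) can be approximated numerically with a Riemann sum. This sketch uses an illustrative density, $f(t) = 2t$ on $[0, 1]$ and zero elsewhere, for which $F(x) = x^2$ on that interval:

```python
# Approximate F(x) = integral of f from lo to x with a left Riemann sum.
def cdf(x, f, lo=0.0, steps=200_000):
    if x <= lo:
        return 0.0
    dt = (x - lo) / steps
    return sum(f(lo + i * dt) for i in range(steps)) * dt

# Illustrative density: f(t) = 2t on [0, 1], zero elsewhere, so F(x) = x^2.
f = lambda t: 2.0 * t if 0.0 <= t <= 1.0 else 0.0

print(cdf(0.5, f))  # should be close to 0.5^2 = 0.25
```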

Gaussian Distribution

$X \sim \mathcal{N}(\mu, \sigma^2) \quad \Longleftrightarrow \quad p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
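A direct transcription of the density above; the default $\mu = 0$, $\sigma = 1$ (the standard normal) is illustrative:

```python
from math import sqrt, pi, exp

# Gaussian density of N(mu, sigma^2).
def gaussian_pdf(x, mu=0.0, sigma=1.0):
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)

# The density peaks at x = mu and is symmetric around it.
print(gaussian_pdf(0.0))           # 1 / sqrt(2*pi) for the standard normal
print(gaussian_pdf(1.0) == gaussian_pdf(-1.0))
```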

Marginal Distribution

Marginal Probability Mass Function

  • Marginal probability mass function of X:

    $p_X(x_i) = \sum_j p(x_i, y_j)$
  • Marginal probability mass function of Y:

    $p_Y(y_j) = \sum_i p(x_i, y_j)$
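Marginalization amounts to summing the joint table over the other variable. A sketch with an illustrative joint PMF over two binary variables:

```python
# Illustrative joint PMF p(x, y) stored as a dict keyed by (x, y).
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

p_X, p_Y = {}, {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p  # sum over y
    p_Y[y] = p_Y.get(y, 0.0) + p  # sum over x

print(p_X)  # marginal of X
print(p_Y)  # marginal of Y
```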

Marginal Probability Density Function

  • Marginal probability density function of X:

    $f_X(x) = \int_{-\infty}^{+\infty} f(x, y)\,dy$
  • Marginal probability density function of Y:

    $f_Y(y) = \int_{-\infty}^{+\infty} f(x, y)\,dx$

Conditional Probability

For a discrete random vector (X,Y), when X=x is known, the conditional probability of r.v. Y=y is:

$p(y \mid x) = P(Y = y \mid X = x) = \frac{p(x, y)}{p(x)}$
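The ratio of joint to marginal can be computed straight from a joint table; the table here is illustrative:

```python
# Illustrative joint PMF p(x, y) over two binary variables.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def p_cond(y, x):
    """Conditional PMF p(y | x) = p(x, y) / p(x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)  # marginal p(x)
    return joint[(x, y)] / p_x

# For each fixed x, the conditional probabilities over y sum to 1.
print(p_cond(0, 1) + p_cond(1, 1))
```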

Sampling

Sampling: given a probability distribution p(x), generate samples that follow it:

$x^{(1)}, x^{(2)}, \dots, x^{(N)} \sim p(x)$

Expectation

Expectation: the probability-weighted average of a random variable.

For discrete r.v. X:

$E[X] = \sum_{n=1}^{N} x_n\, p(x_n)$

For continuous r.v. X:

$E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx$
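The discrete case is a one-line weighted sum; the values and probabilities below are illustrative:

```python
# Expectation of a discrete r.v. as a probability-weighted sum.
values = [1, 2, 3]
probs = [0.2, 0.5, 0.3]  # illustrative PMF; must sum to 1

E = sum(x * p for x, p in zip(values, probs))
print(E)  # 1*0.2 + 2*0.5 + 3*0.3
```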

Law of Large Numbers

When the number of samples is large, the sample mean becomes arbitrarily close to the true mean (the expectation).

Given N independently and identically distributed (I.I.D.) samples

$x^{(1)}, x^{(2)}, \dots, x^{(N)} \sim p(x)$

The sample mean converges to the expected value:

$\bar{X}_N = \frac{1}{N} \sum_{i=1}^{N} x^{(i)} \longrightarrow E[X] \quad \text{as } N \to \infty$
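The convergence can be checked empirically by drawing Bernoulli samples and comparing the sample mean to $E[X] = \mu$; the choices of $\mu$, N, and the seed are illustrative:

```python
import random

random.seed(0)          # fixed seed so the run is reproducible
mu, N = 0.3, 200_000    # illustrative success probability and sample count

# Draw N i.i.d. Bernoulli(mu) samples: x = 1 with probability mu, else 0.
samples = [1 if random.random() < mu else 0 for _ in range(N)]

sample_mean = sum(samples) / N
print(sample_mean)      # close to E[X] = mu for large N
```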